Wordcount

The wordcount() function is already defined and complete. It calls the print_words() and print_top() functions, which you write.

For the count option, implement a print_words(filename) function that counts how often each word appears in the text and prints:

word1 count1
word2 count2
...

Print the above list sorted by word (Python will sort punctuation to come before letters -- that's fine). Store all the words as lowercase, so 'The' and 'the' count as the same word.
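As a rough illustration of the counting and sorting described above, here is a minimal sketch that works on an in-memory string rather than a file. The sample text and the variable names are made up for the example; it is one possible approach, not the required solution.

```python
# Hypothetical sample text; the real exercise reads the words from a file.
sample = "The cat saw the -- other cat"

# Count each word, lowercased so 'The' and 'the' share one entry.
counts = {}
for word in sample.lower().split():
    counts[word] = counts.get(word, 0) + 1

# sorted() on a dict iterates its keys in order; punctuation such as '--'
# sorts before letters.
for word in sorted(counts):
    # Single-argument print(...) behaves the same in Python 2 and Python 3.
    print(word + ' ' + str(counts[word]))
```

Printing the dict itself before adding the sorted loop is an easy intermediate check that the counts look right.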

For the topcount option, implement a print_top(filename) function that is similar to print_words() but prints just the 20 most common words, sorted so that the most common word comes first, then the next most common, and so on.
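Sorting by count rather than by word can be done by passing a key function to sorted(). The sketch below assumes the counts already live in a dict; the helper name top_items and the alphabetical tie-breaking are illustrative choices, since the exercise does not say how to order words with equal counts.

```python
# Hypothetical word counts standing in for the result of reading a file.
counts = {'the': 5, 'a': 3, 'cat': 3, 'zebra': 1}

def top_items(counts, n=20):
    """Return up to n (word, count) pairs, most common first."""
    # Negating the count sorts high counts first; the word itself breaks
    # ties alphabetically (an assumption, not part of the exercise).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:n]

for word, count in top_items(counts, 2):
    print(word + ' ' + str(count))
```

Slicing with [:n] is safe even when the text has fewer than n distinct words; it simply returns everything available.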

Use str.split() (no arguments) to split on all whitespace.
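To see why the no-argument form is recommended, this short sketch compares it with splitting on a single space. The sample strings are made up for the illustration.

```python
# With no arguments, split() treats any run of whitespace (spaces, tabs,
# newlines) as one separator and ignores leading/trailing whitespace.
text = "The  quick\tbrown\nfox  "
words = text.split()
print(words)

# Splitting on a literal space instead keeps empty strings wherever two
# spaces are adjacent, which would pollute the word counts.
pieces = "a  b".split(' ')
print(pieces)
```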

Workflow: don't build the whole program at once. Get it to an intermediate milestone and print your data structure. When that's working, try for the next milestone.

Optional: define a helper function to avoid code duplication inside print_words() and print_top().
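One way such a helper might look is sketched below: it reads a file and returns the word/count dict, which both printing functions could then call. The name build_word_counts and the temporary demo file are assumptions made for this example, not part of the original exercise.

```python
import os
import tempfile

def build_word_counts(filename):
    """Return a dict mapping each lowercased word in the file to its count."""
    counts = {}
    with open(filename) as f:
        for line in f:
            for word in line.lower().split():
                counts[word] = counts.get(word, 0) + 1
    return counts

# Demonstration on a throwaway file so the sketch runs without data/poem.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("The cat\nthe hat\n")
    demo_path = tmp.name

demo_counts = build_word_counts(demo_path)
os.remove(demo_path)
print(demo_counts)
```

With a helper like this, print_words() and print_top() differ only in how they sort and how many entries they print.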


In [ ]:
import sys

def wordcount(option, filename):
    if option == 'count':
        print_words(filename)
    elif option == 'topcount':
        print_top(filename)
    else:
        print('unknown option: ' + option)

In [ ]:
# +++your code here+++
# Define print_words(filename) and print_top(filename) functions.

In [ ]:
wordcount('count', 'data/poem.txt')

In [ ]:
wordcount('topcount', 'data/poem.txt')

In [ ]:
wordcount('count', 'data/wiki.txt')

In [ ]:
wordcount('topcount', 'data/wiki.txt')

Note: This notebook is an adaptation of Google's Python tutorial: https://developers.google.com/edu/python

